
**************************************************************
* Create a dataset with 2022 election participation
**************************************************************

clear all
odbc load, table("VD2022_AV") connectionstring("DRIVER={SQL Server};SERVER={mq02\b};DATABASE={P0846};Trusted_Connection={Yes}")

* Rename variables to match the 2018 data so the same code can be reused
rename RD Rrost
rename Rostratt rostratt
rename P0846_Lopnr_PersonNr P0846_LopNr_PersonNr

* Keep Swedish citizens only (rostratt == "3" are foreign born)
keep if rostratt == "1" | rostratt == "2"
destring Rrost, replace
drop if missing(Rrost)
tab Rrost

* Keep only relevant variables
keep P0846_LopNr_PersonNr Rrost 

* Some individuals occur multiple times in the data
* i) Drop exact duplicates (observations identical on all remaining variables)
duplicates drop
* ii) 4 duplicates on P0846_LopNr_PersonNr remain after step i;
*     for these, keep the observation in which the person voted
duplicates tag P0846_LopNr_PersonNr, gen(multiple_personid)
tab multiple_personid, miss
egen max_Rrost = max(Rrost), by(P0846_LopNr_PersonNr)
keep if max_Rrost == Rrost
keep P0846_LopNr_PersonNr Rrost

* Give the variables descriptive names
rename P0846_LopNr_PersonNr PersonId
rename Rrost Voted2022
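
* Sanity check: after step ii each person should appear only once.
* isid stops the do-file with an error if PersonId does not uniquely
* identify the observations (it also errors if PersonId has missing values).
isid PersonId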

compress
save "Data/data_voting_2022", replace




